1. Importing the libraries:
In [1]:
import pandas as pd
import numpy as np
import folium
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.templates.default = 'plotly_white'
import warnings
warnings.filterwarnings("ignore")
2. Reading the dataset:
In [2]:
metro_data = pd.read_csv("Delhi-Metro-Network.csv")
In [3]:
metro_data.head(2)
Out[3]:
| | Station ID | Station Name | Distance from Start (km) | Line | Opening Date | Station Layout | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Jhil Mil | 10.3 | Red line | 2008-04-06 | Elevated | 28.67579 | 77.31239 |
| 1 | 2 | Welcome [Conn: Red] | 46.8 | Pink line | 2018-10-31 | Elevated | 28.67180 | 77.27756 |
3. Performing EDA:
In [4]:
metro_data.describe()
Out[4]:
| | Station ID | Distance from Start (km) | Latitude | Longitude |
|---|---|---|---|---|
| count | 285.000000 | 285.000000 | 285.000000 | 285.000000 |
| mean | 143.000000 | 19.218947 | 28.595428 | 77.029315 |
| std | 82.416625 | 14.002862 | 0.091316 | 2.875400 |
| min | 1.000000 | 0.000000 | 27.920862 | 28.698807 |
| 25% | 72.000000 | 7.300000 | 28.545828 | 77.107130 |
| 50% | 143.000000 | 17.400000 | 28.613453 | 77.207220 |
| 75% | 214.000000 | 28.800000 | 28.666360 | 77.281165 |
| max | 285.000000 | 52.700000 | 28.878965 | 77.554479 |
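One detail in the summary above is worth a closer look: the minimum Longitude is 28.698807 while the quartiles all sit near 77.1 to 77.3, which suggests at least one row may carry a mis-entered coordinate. A minimal sketch of how such rows could be flagged is shown below. The toy rows and the bounding box limits (`LAT_RANGE`, `LON_RANGE`) are assumptions made for illustration, not values taken from the dataset.

```python
import pandas as pd

# Toy rows standing in for metro_data; the third row mimics a longitude
# that looks like a latitude (hypothetical values for illustration).
sample = pd.DataFrame({
    "Station Name": ["A", "B", "C"],
    "Latitude": [28.67579, 28.67180, 28.69881],
    "Longitude": [77.31239, 77.27756, 28.69881],
})

# Assumed rough bounding box around the Delhi region; adjust as needed.
LAT_RANGE = (27.5, 29.5)
LON_RANGE = (76.0, 78.5)

# A row is plausible only if both coordinates fall inside the box.
in_box = (
    sample["Latitude"].between(*LAT_RANGE)
    & sample["Longitude"].between(*LON_RANGE)
)
suspicious = sample[~in_box]
print(suspicious)
```

Running the same filter on `metro_data` would list the stations to inspect by hand before plotting them on a map.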
3.1 Checking the NULL Values:
In [5]:
metro_data.isna().sum()
Out[5]:
Station ID                  0
Station Name                0
Distance from Start (km)    0
Line                        0
Opening Date                0
Station Layout              0
Latitude                    0
Longitude                   0
dtype: int64
3.2 Checking the data types:
In [6]:
metro_data.dtypes
Out[6]:
Station ID                    int64
Station Name                 object
Distance from Start (km)    float64
Line                         object
Opening Date                 object
Station Layout               object
Latitude                    float64
Longitude                   float64
dtype: object
The Opening Date column is stored as an object (string), so let's convert it to datetime:
In [7]:
metro_data.head(2)
Out[7]:
| | Station ID | Station Name | Distance from Start (km) | Line | Opening Date | Station Layout | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Jhil Mil | 10.3 | Red line | 2008-04-06 | Elevated | 28.67579 | 77.31239 |
| 1 | 2 | Welcome [Conn: Red] | 46.8 | Pink line | 2018-10-31 | Elevated | 28.67180 | 77.27756 |
In [8]:
metro_data['Opening Date'] = pd.to_datetime(metro_data['Opening Date'])
In [9]:
metro_data.dtypes
Out[9]:
Station ID                           int64
Station Name                        object
Distance from Start (km)           float64
Line                                object
Opening Date                datetime64[ns]
Station Layout                      object
Latitude                           float64
Longitude                          float64
dtype: object
There are no NULL values, and the Opening Date column is now in datetime format, so let's start the analysis.
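The conversion above worked because every value was a well-formed date. As a hedged sketch, the snippet below shows one way to make the same step more defensive: an explicit `format` plus `errors="coerce"` turns unparseable strings into `NaT` so they can be counted. The `raw` frame and its malformed third value are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; the third entry is deliberately malformed.
raw = pd.DataFrame({"Opening Date": ["2008-04-06", "2018-10-31", "not a date"]})

# errors="coerce" replaces unparseable strings with NaT instead of raising.
parsed = pd.to_datetime(raw["Opening Date"], format="%Y-%m-%d", errors="coerce")

# Count how many values failed to parse.
n_bad = int(parsed.isna().sum())
print(f"{n_bad} value(s) could not be parsed")
```

A count of zero on the real column would confirm that no dates were silently dropped.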
4. Geo-Spatial Analysis:¶
In [10]:
metro_data['Line'].value_counts()
Out[10]:
Blue line            49
Pink line            38
Yellow line          37
Voilet line          34
Red line             29
Magenta line         25
Aqua line            21
Green line           21
Rapid Metro          11
Blue line branch      8
Orange line           6
Gray line             3
Green line branch     3
Name: Line, dtype: int64
In [11]:
metro_data['Line'].value_counts().sum()
Out[11]:
285
In [12]:
metro_data.shape
Out[12]:
(285, 8)
In [13]:
print(f"In total we have: {metro_data['Line'].nunique()} unique lines, "
"so, let's assign a unique color to them so that we can plot it.")
In total we have: 13 unique lines, so, let's assign a unique color to them so that we can plot it.
4.1 Assigning a unique color to each of the railway lines:
In [14]:
# Map each line name (spelled exactly as in the dataset) to a marker color
line_colors = {
    "Blue line": "blue",
    "Pink line": "pink",
    "Yellow line": "yellow",
    "Voilet line": "purple",
    "Red line": "red",
    "Magenta line": "black",
    "Aqua line": "lightblue",
    "Green line": "green",
    "Rapid Metro": "cadetblue",
    "Blue line branch": "darkpurple",
    "Orange line": "orange",
    "Gray line": "beige",
    "Green line branch": "lightgreen"
}
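Because the dictionary keys must match the dataset's spelling exactly (for example "Voilet line"), a quick coverage check can catch a missing or misspelled key before plotting. The sketch below uses a small hypothetical series and color dictionary in place of `metro_data['Line']` and `line_colors`.

```python
import pandas as pd

# Hypothetical stand-ins for metro_data['Line'] and line_colors.
lines_in_data = pd.Series(["Blue line", "Pink line", "Gray line", "Blue line"])
colors = {"Blue line": "blue", "Pink line": "pink"}

# Line names present in the data but absent from the color mapping.
missing = sorted(set(lines_in_data.unique()) - set(colors))
print("Lines without a color:", missing)
```

An empty list means every line has its own color and no marker would fall back to a default.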
Let's visualize the tracks now:
In [15]:
print("The map below shows the stations of each metro line; hover over a marker for more info")
delhi_map_w_line_tooltip = folium.Map(location=[28.7041, 77.1025], zoom_start=11)
for index, row in metro_data.iterrows():
    line = row['Line']
    color = line_colors.get(line, 'black')  # If line name is not found in dictionary, the default is black
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=row['Station Name'],
        # Tooltips are rendered as HTML, so <br> is used for the line break
        tooltip=f"Station name: {row['Station Name']}<br>Line: {line}",
        icon=folium.Icon(color=color)
    ).add_to(delhi_map_w_line_tooltip)
delhi_map_w_line_tooltip
The map below shows the stations of each metro line; hover over a marker for more info
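The map above places one marker per station but does not draw the tracks between them. One possible way to approximate a track, sketched below, is to order each line's stations by "Distance from Start (km)" and collect their coordinates. This assumes stations sharing a `Line` value form a single unbranched route, which the dataset does not guarantee. The `stations` frame is a toy example and most of its values are hypothetical.

```python
import pandas as pd

# Toy stand-in for metro_data; most values are hypothetical.
stations = pd.DataFrame({
    "Line": ["Red line", "Red line", "Red line", "Pink line", "Pink line"],
    "Distance from Start (km)": [10.3, 0.0, 5.1, 46.8, 12.0],
    "Latitude": [28.67579, 28.70000, 28.69000, 28.67180, 28.60000],
    "Longitude": [77.31239, 77.20000, 77.25000, 77.27756, 77.30000],
})

# For each line, list [lat, lon] pairs ordered by distance along the line.
routes = {
    line: group.sort_values("Distance from Start (km)")[
        ["Latitude", "Longitude"]
    ].values.tolist()
    for line, group in stations.groupby("Line")
}
print(routes["Red line"])
```

Each list in `routes` could then be passed as the `locations` argument of `folium.PolyLine` and added to the map, giving a rough straight-segment outline of each line.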
5. The growth of railways over time:
5.1 Adding an extra column for the years:
In [16]:
metro_data['year'] = metro_data['Opening Date'].dt.year
In [17]:
metro_data.head(2)
Out[17]:
| | Station ID | Station Name | Distance from Start (km) | Line | Opening Date | Station Layout | Latitude | Longitude | year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Jhil Mil | 10.3 | Red line | 2008-04-06 | Elevated | 28.67579 | 77.31239 | 2008 |
| 1 | 2 | Welcome [Conn: Red] | 46.8 | Pink line | 2018-10-31 | Elevated | 28.67180 | 77.27756 | 2018 |
In [18]:
num_of_stations_per_year = metro_data['year'].value_counts()
num_of_stations_per_year_df = num_of_stations_per_year.reset_index()
num_of_stations_per_year_df.head(2)
Out[18]:
| | index | year |
|---|---|---|
| 0 | 2018 | 64 |
| 1 | 2010 | 54 |
5.2 Renaming the columns in the new station count dataframe:
In [19]:
num_of_stations_per_year_df.rename(columns={
    "index": "Year",
    "year": "Number of railway tracks"
}, inplace=True)
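The rename above relies on `reset_index()` producing columns called `index` and `year`, as in the output shown earlier. In pandas 2.0 and later, `value_counts()` names its result `count` and keeps the original column name on the index, so those keys would no longer match. A sketch that should not depend on the pandas version is given below. The `years` series and the column name "Number of stations" are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for metro_data['year'].
years = pd.Series([2018, 2010, 2018, 2002, 2010, 2018], name="year")

stations_per_year = (
    years.value_counts()
    .rename_axis("Year")                      # name the index explicitly
    .reset_index(name="Number of stations")   # name the count column explicitly
    .sort_values("Year")                      # chronological order
    .reset_index(drop=True)
)
print(stations_per_year)
```

Naming both columns explicitly avoids guessing what `reset_index()` will call them.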
In [20]:
fig = px.bar(data_frame=num_of_stations_per_year_df, x='Year', y='Number of railway tracks',
text='Number of railway tracks',
title="Number of metro stations opened each year from 2002 to 2019")
fig.update_layout(yaxis_title = "Number of stations",
xaxis_tickangle=-45, xaxis=dict(tickmode='linear'))
fig.update_traces(textposition='outside')
fig
Some years saw no new stations. The data does not say why, but possible reasons include:
- Lack of planning,
- Funding constraints, or
- Construction challenges.
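The years without openings can also be listed explicitly rather than read off the chart. A minimal sketch, assuming a hypothetical `opened` series in place of `metro_data['year']`: count the stations per year, reindex over the full range of years, and keep the years whose count is zero.

```python
import pandas as pd

# Hypothetical stand-in for metro_data['year'].
opened = pd.Series([2002, 2004, 2004, 2006], name="year")

# Stations per year, in chronological order.
counts = opened.value_counts().sort_index()

# Fill in every year between the first and last opening with a zero count.
all_years = range(int(counts.index.min()), int(counts.index.max()) + 1)
full = counts.reindex(all_years, fill_value=0)

# Years in which no station opened.
gap_years = full[full == 0].index.tolist()
print(gap_years)
```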
From the above chart, 2018 saw the highest number of stations opened (64). Now, let's look at the stations opened that year and their layouts.
In [21]:
metro_2018_data = metro_data[metro_data['year'] == 2018][['Station Name', 'Distance from Start (km)', 'Station Layout']]
In [22]:
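print("In 2018, the 'Distance from Start (km)' values of the newly opened stations sum to "
      f"{metro_2018_data['Distance from Start (km)'].sum():.2f} km. Each value is cumulative along its line, "
      "so this is not the length of track added.")
In 2018, the 'Distance from Start (km)' values of the newly opened stations sum to 1540.90 km. Each value is cumulative along its line, so this is not the length of track added.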
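Because "Distance from Start (km)" is measured cumulatively along each line, adding it up across stations does not yield a track length. A rough alternative, sketched below, is to take for each line the span between the nearest and farthest station opened that year. This assumes the stations opened on a line in that year are contiguous, and it ignores any track linking them to older stations, so treat it as an approximation. The `opened_2018` frame holds made-up values.

```python
import pandas as pd

# Hypothetical stations opened in a single year.
opened_2018 = pd.DataFrame({
    "Line": ["Pink line", "Pink line", "Pink line", "Aqua line", "Aqua line"],
    "Distance from Start (km)": [10.0, 12.5, 20.0, 0.0, 3.0],
})

# Span covered on each line: farthest minus nearest new station.
span = opened_2018.groupby("Line")["Distance from Start (km)"].agg(
    lambda s: s.max() - s.min()
)

naive_sum = opened_2018["Distance from Start (km)"].sum()  # sum of cumulative values
approx_new_km = span.sum()                                 # sum of per-line spans
print(f"naive sum: {naive_sum:.1f} km, per-line span: {approx_new_km:.1f} km")
```

In this toy example the naive sum is several times larger than the per-line span, which illustrates how far the two measures can diverge.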
In [23]:
station_layoout = metro_2018_data['Station Layout'].value_counts().reset_index()
In [24]:
station_layoout.rename(columns={
    "index": "Station Layout",
    "Station Layout": "Total"
}, inplace=True)
In [25]:
fig = px.bar(data_frame=station_layoout, x="Station Layout", y="Total", color="Station Layout",
text="Total", title="Total of different layouts added in the year 2018")
fig.update_traces(textposition='outside', textfont_size=14)
fig
6. Line Analysis:
Now, let's analyze each line in terms of its number of stations and the average distance between stations.
In [26]:
metro_data.head()
Out[26]:
| | Station ID | Station Name | Distance from Start (km) | Line | Opening Date | Station Layout | Latitude | Longitude | year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Jhil Mil | 10.3 | Red line | 2008-04-06 | Elevated | 28.675790 | 77.312390 | 2008 |
| 1 | 2 | Welcome [Conn: Red] | 46.8 | Pink line | 2018-10-31 | Elevated | 28.671800 | 77.277560 | 2018 |
| 2 | 3 | DLF Phase 3 | 10.0 | Rapid Metro | 2013-11-14 | Elevated | 28.493600 | 77.093500 | 2013 |
| 3 | 4 | Okhla NSIC | 23.8 | Magenta line | 2017-12-25 | Elevated | 28.554483 | 77.264849 | 2017 |
| 4 | 5 | Dwarka Mor | 10.2 | Blue line | 2005-12-30 | Elevated | 28.619320 | 77.033260 | 2005 |
In [27]:
total_distance_per_line = metro_data.groupby("Line")['Distance from Start (km)'].max().reset_index()
total_number_of_lines = metro_data['Line'].value_counts().reset_index()
In [28]:
total_number_of_lines.rename(columns={'index': 'Line', 'Line':'total_stops'}, inplace=True)
In [29]:
distance_per_line_df = pd.merge(total_distance_per_line, total_number_of_lines, on='Line', how='inner')
In [30]:
distance_per_line_df['average_km_per_station'] = distance_per_line_df['Distance from Start (km)'] / (distance_per_line_df['total_stops'] - 1)
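The formula above divides the largest "Distance from Start (km)" by the number of gaps between stations, which matches the mean gap only when the first station on a line sits at 0 km. As a cross-check, the gaps can be computed directly from consecutive stations. The sketch below does this for a toy three-station line. The endpoint 3.9 km matches the Gray line figure in the summary table, while the middle value is hypothetical.

```python
import pandas as pd

# Toy three-station line; the middle distance is hypothetical.
toy = pd.DataFrame({
    "Line": ["Gray line"] * 3,
    "Distance from Start (km)": [3.9, 0.0, 1.2],
})

# Order stations along each line, then take the gap to the previous station.
ordered = toy.sort_values(["Line", "Distance from Start (km)"]).copy()
ordered["gap_km"] = ordered.groupby("Line")["Distance from Start (km)"].diff()

# Mean and largest gap per line (the first station's NaN gap is skipped).
summary = ordered.groupby("Line")["gap_km"].agg(["mean", "max"])
print(summary)
```

Working from individual gaps also exposes the largest gap on each line, which a single average hides.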
Summary of total distance covered, total number of stops, and average distance (km) between stations:
In [31]:
distance_per_line_df
Out[31]:
| | Line | Distance from Start (km) | total_stops | average_km_per_station |
|---|---|---|---|---|
| 0 | Aqua line | 27.1 | 21 | 1.355000 |
| 1 | Blue line | 52.7 | 49 | 1.097917 |
| 2 | Blue line branch | 8.1 | 8 | 1.157143 |
| 3 | Gray line | 3.9 | 3 | 1.950000 |
| 4 | Green line | 24.8 | 21 | 1.240000 |
| 5 | Green line branch | 2.1 | 3 | 1.050000 |
| 6 | Magenta line | 33.1 | 25 | 1.379167 |
| 7 | Orange line | 20.8 | 6 | 4.160000 |
| 8 | Pink line | 52.6 | 38 | 1.421622 |
| 9 | Rapid Metro | 10.0 | 11 | 1.000000 |
| 10 | Red line | 32.7 | 29 | 1.167857 |
| 11 | Voilet line | 43.5 | 34 | 1.318182 |
| 12 | Yellow line | 45.7 | 37 | 1.269444 |
Now, let's visualize these results:
In [32]:
# Creating the subplots first, in a 1x2 grid:
fig = make_subplots(rows=1, cols=2,
                    subplot_titles=("Average distance (km) between stops", "Total number of stops per line"),
                    horizontal_spacing=0.2)
# Plotting the average distance in km between stops:
fig.add_trace(go.Bar(y=distance_per_line_df['Line'], x=distance_per_line_df['average_km_per_station'],
                     name='Average distance in KM per station', marker_color='crimson', orientation='h'),
              row=1, col=1)
# Plotting the count of stops on each line:
fig.add_trace(go.Bar(x=distance_per_line_df['total_stops'], y=distance_per_line_df['Line'],
                     name='Total number of stops in each line', marker_color='navy', orientation='h'),
              row=1, col=2)
# Updating X and Y axis properties:
fig.update_xaxes(title_text="Average Distance in KM", row=1, col=1)
fig.update_xaxes(title_text="Number of stations", row=1, col=2)
fig.update_yaxes(title_text="Metro Lines", row=1, col=1)
# Update layout:
fig.update_layout(height=550, width=950, title_text="Metro Line Analysis")
fig
7. Station Layout Analysis:
In [33]:
station_layout_df = metro_data['Station Layout'].value_counts().reset_index()
station_layout_df
Out[33]:
| | index | Station Layout |
|---|---|---|
| 0 | Elevated | 214 |
| 1 | Underground | 68 |
| 2 | At-Grade | 3 |
In [34]:
fig = px.bar(data_frame=station_layout_df, x='index', y='Station Layout', color='index',
title='Delhi Metro Station Layout Count', text='Station Layout',
labels={'index':'Layouts', 'Station Layout':'Number of stations'})
fig.update_traces(textposition = 'outside')
fig
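The raw counts can also be expressed as shares of the network. The short sketch below rebuilds the three counts from the table above as a plain series and converts them to percentages. The names `layout_counts` and `layout_share` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Counts copied from the station layout table above.
layout_counts = pd.Series({"Elevated": 214, "Underground": 68, "At-Grade": 3})

# Percentage share of each layout, rounded to one decimal place.
layout_share = (layout_counts / layout_counts.sum() * 100).round(1)
print(layout_share)
```

On the full dataframe, `metro_data['Station Layout'].value_counts(normalize=True)` should give the same proportions as fractions.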